Feature-Rich Error Detection in Scientific Writing Using Logistic Regression

نویسندگان

  • Madeline Remse
  • Mohsen Mesgar
  • Michael Strube
چکیده

The goal of the Automatic Evaluation of Scientific Writing (AESW) Shared Task 2016 is to identify sentences in scientific articles which need editing to improve their correctness and readability or to make them better fit within the genre at hand. We encode many different types of errors occurring in the dataset by linguistic features. We use logistic regression to assign a probability indicating whether a sentence needs to be edited. We participate in both tracks at AESW 2016: binary prediction and probabilistic estimation. In the former track, our model (HITS) gets the fifth place and in the latter one, it ranks first according to the evaluation metric.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phishing website detection using weighted feature line embedding

The aim of phishing is tracing the users' s private information without their permission by designing a new website which mimics the trusted website. The specialists of information technology do not agree on a unique definition for the discriminative features that characterizes the phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. M...

متن کامل

Feature-Rich Two-Stage Logistic Regression for Monolingual Alignment

Monolingual alignment is the task of pairing semantically similar units from two pieces of text. We report a top-performing supervised aligner that operates on short text snippets. We employ a large feature set to (1) encode similarities among semantic units (words and named entities) in context, and (2) address cooperation and competition for alignment among units in the same snippet. These fe...

متن کامل

Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification

Nowadays, malicious URLs are the common threat to the businesses, social networks, net-banking etc. Existing approaches have focused on binary detection i.e. either the URL is malicious or benign. Very few literature is found which focused on the detection of malicious URLs and their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...

متن کامل

Hidden logistic linear regression for support vector machine based phone verification

Phone verification approach to mispronunciation detection using a combination of Neural Network (NN) and Support Vector Machine (SVM) has been shown to yield improved verification performance. This approach uses a NN to predict the HMM state posterior probabilities. The average posterior probability vectors computed over each phone segment are used as input features to a SVM back-end to generat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016